Characterization and Sequence Analysis of T-DNA Insertion Structures and Paralogous Regions in the Genome of Arabidopsis thaliana Based on Optimizations of Data Acquisition and Evaluation for the GABI-Kat Collection (Cumulative Dissertation)

Authors

  • Nils Kleinbölting
  • Bernd Weisshaar
Abstract

T-DNA insertion mutants are very valuable for reverse genetics in Arabidopsis thaliana. Several projects have generated large sequence-indexed collections of T-DNA insertion lines, of which GABI-Kat is the second largest resource worldwide. User access to the collection and its Flanking Sequence Tags (FSTs) is provided by the front end SimpleSearch (http://www.GABI-Kat.de). Several significant improvements have been implemented recently. The database now relies on the TAIRv10 genome sequence and annotation dataset. All FSTs have been newly mapped using an optimized procedure that leads to improved accuracy of insertion site predictions. A fraction of the collection with weak FST yield was re-analysed by generating new FSTs. Along with newly found predictions for older sequences, about 20 000 new FSTs were included in the database. Information about groups of FSTs pointing to the same insertion site that is found in several lines but is real only in a single line is included, and many problematic FST-to-line links have been corrected using new wet-lab data. SimpleSearch currently contains data from 71 000 lines with predicted insertions covering 62.5% of the 27 206 nuclear protein coding genes, and offers insertion allele-specific data from 9545 confirmed lines that are available from the Nottingham Arabidopsis Stock Centre.

*To whom correspondence should be addressed. Tel: +49 521 106 8720; Fax: +49 521 106 6423; Email: [email protected]

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Nucleic Acids Research, 2011, 1–5, doi:10.1093/nar/gkr1047. Nucleic Acids Research Advance Access published November 12, 2011. © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Since the genome sequence of the model plant Arabidopsis thaliana was completed in the year 2000 (1), the determination of gene function has been a key task in the Arabidopsis research community. Insertional mutagenesis approaches have proven to be a valuable tool for reverse genetics (2). Several large collections of A. thaliana lines containing independent insertions of Agrobacterium T-DNA in the plant genome have been established. T-DNA integration in the plant genome results in stable mutations, which may perturb gene functions. An important goal is to saturate the A. thaliana genome with T-DNA insertion lines for each (or at least most) of the 27 206 nuclear protein-coding genes that are currently annotated (3).

A popular strategy to determine the position of a T-DNA insertion in the genome of a given insertion line is to generate Flanking Sequence Tags (FSTs). Using PCR-based methods, genomic DNA fragments flanking the T-DNA are amplified (4), sequenced and subsequently mapped to the genome. When this is applied to a large collection of lines, the population can easily be searched for mutants of interest.

GABI-Kat is the second largest FST-indexed T-DNA insertion line collection of A. thaliana and has been publicly available since 2002 (5). User access to the collection and extensive metadata for the included mutants and alleles is provided by GABI-Kat SimpleSearch, the web interface of the corresponding database (6). The interface can be used either to search the collection for mutant alleles of interest or, more importantly, to access different kinds of information about specific GABI-Kat lines. Lines containing insertions of interest can be ordered via a web order form, which is also provided in SimpleSearch. In contrast to other FST databases, GABI-Kat SimpleSearch offers information on confirmation success of lines along with sequences derived from the T-DNA/genome junction of an offspring generation, segregation data and primer information for the respective insertions (7).

Since its availability in 2002, the database as well as the interface has been continuously improved. However, during the last 18 months several significant improvements and extensions have been implemented that are beneficial for users who rely on SimpleSearch when working with GABI-Kat insertion alleles. These enhancements include an extended data set, allow easier and more comfortable user access, and provide more detailed and more reliable meta-information about the insertion alleles in the GABI-Kat collection. In detail, the recent improvements are (i) an update to the most recent A. thaliana genome annotation dataset TAIRv10, (ii) an improved insertion site prediction and gene hit definition, (iii) enhanced information about confirmation success and line availability. Together with the correction of problematic FST-to-line links using new experimental data, these enhancements result in an increased overall quality and reliability of the collection and its database.

RESULTS AND DISCUSSION

General database content

The SimpleSearch database contains data about GABI-Kat insertion mutants, the derived FSTs and their mapping to the A. thaliana genome sequence. In addition and in contrast to other FST databases like SIGnAL (8) or FLAGdb++ (9), SimpleSearch focuses on metadata concerning GABI-Kat lines and insertion alleles.
The database contains data about confirmation attempts for predicted insertion alleles, genetic segregation data of the resistance phenotype provided by the T-DNA, allowing an estimation of the number of insertion loci per line, and sequences of successfully confirmed alleles from the offspring generation, which allow the real site of insertion to be determined much more accurately than from the crude FST sequences. Moreover, the user is provided with information about line availability (alleles in dead lines are lost although the respective FST exists in databases), and about insertions that could not be confirmed (failed insertions, which are predicted from FSTs but do not exist in subsequent generations). The updated version of SimpleSearch offers significantly improved data quality due to intensive and ongoing quality management and manual curation (see below).

GABI-Kat FSTs are produced by an adaptor-ligation PCR method (4) (see also the Methods and FAQ pages on http://www.gabikat.de/) and are mapped to the A. thaliana genome using BLAST (10). Users can access the FST data, select alleles of interest and place insertion requests. Upon request, the GABI-Kat lines are segregated on selective medium, genomic DNA is prepared and the predicted insertion sites are confirmed by PCR and sequencing, using an insertion site specific primer and a T-DNA border primer. The obtained ‘confirmation sequences’ are again mapped to the genome. If the insertion site deduced from confirmation sequencing matches the position predicted from the respective FST, the insertion is regarded as confirmed.

Update of the database to TAIRv10

Until recently, the SimpleSearch database (7) was based upon the TIGR version 5 annotation dataset of the A. thaliana genome. The TIGRv5 annotation consisted of individual BAC sequences and contained an outdated set of gene annotation data (11).
SimpleSearch has now been updated to the current genome annotation data, namely TAIR version 10, which is based on pseudochromosomes (ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/whole_chromosomes/). In this context all FSTs from the GABI-Kat collection were newly mapped to the genome sequence, and putative insertion positions were deduced. The increase in the number of successfully mapped FSTs results from using optimized mapping parameters, reduced gaps in the genome sequence and also from the generation of new FSTs for fractions of the GABI-Kat population with formerly low FST yield. About 20 000 new FSTs have been submitted to EMBL/GenBank. These contribute to a current total of 130 000 FSTs at EMBL/GenBank, which are also accessible via SimpleSearch.

The reliability of the insertion site prediction was improved by deducing the insertion position from the annealing site of the T-DNA border-specific primer and the nucleotide positions from the original trace-file of the FST (Figure 1) (12,13). Particularly in cases where no T-DNA sequence is detectable in the FST, the prediction of the insertion positions is now more accurate. This advantage is evident when compared to insertion positions deduced from the called and trimmed sequence only (8). However, deletions at the T-DNA border to genome junction still cause errors in the prediction, which can only be resolved by detailed analysis of the confirmation sequences. Nevertheless, we now explicitly predict a defined pseudochromosome position as insertion position, and not only a locus that is described by a gene code or BAC name.

For a better assessment of the relevance of a predicted insertion allele, we used data from TAIRv10 to further qualify hits with respect to annotated genes. TAIRv10 contains information about the untranslated region of the mRNA (UTR) for the majority of the protein coding genes, as well as information about RNA-coding genes.
Our former definition described a ‘gene hit’ as an insertion site prediction between 300 bp upstream of the ATG and 300 bp downstream of the stop codon of a gene, whereas a ‘CDSi hit’ had a predicted insertion between the ATG and the stop codon (CDSi for coding sequence plus

Figure 1. Workflow of the improved insertion site prediction. The insertion position is determined using the best BLAST hit of the FST sequence vs. the A. thaliana genome sequence, and the location of the T-DNA within the FST sequence determined by pregap4. If no T-DNA is detected at the start of the FST sequence, the insertion site is located x bases upstream of the BLAST hit, where x is the number of bases before the start of the BLAST hit minus the distance of the T-DNA specific primer to the T-DNA border. Otherwise, the start of the BLAST hit is considered as insertion position.
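
The rule in the Figure 1 caption and the former hit definition can be illustrated with a minimal Python sketch. This is not the GABI-Kat pipeline code: the function names, the parameter names, the strand handling ("upstream" read relative to the direction of the FST read) and the inclusive window boundaries are all assumptions made for illustration.

```python
def predict_insertion_position(hit_start, strand, bases_before_hit,
                               primer_to_border, tdna_detected):
    """Predict a pseudochromosome coordinate for the insertion (hypothetical helper).

    hit_start        -- genome coordinate where the best BLAST hit begins (in read direction)
    strand           -- "+" or "-" orientation of the hit (assumed convention)
    bases_before_hit -- number of trace-file bases preceding the BLAST hit
    primer_to_border -- distance from the T-DNA specific primer to the T-DNA border
    tdna_detected    -- True if T-DNA sequence was found at the start of the FST
    """
    if tdna_detected:
        # T-DNA found at the start of the FST: the hit start is the insertion position.
        return hit_start
    # x = bases before the BLAST hit minus the primer-to-border distance.
    x = bases_before_hit - primer_to_border
    # Assumption: upstream means lower coordinates for a plus-strand hit
    # and higher coordinates for a minus-strand hit.
    return hit_start - x if strand == "+" else hit_start + x


def classify_hit(position, atg, stop, flank=300):
    """Classify a position under the former 'gene hit' / 'CDSi hit' definition.

    Returns the most specific label; boundaries are treated as inclusive (assumption).
    """
    lo, hi = min(atg, stop), max(atg, stop)  # handles genes on either strand
    if lo <= position <= hi:
        return "CDSi hit"
    if lo - flank <= position <= hi + flank:
        return "gene hit"
    return "no hit"


# Example with made-up coordinates: 80 bases precede the hit, 50 of them are T-DNA.
pos = predict_insertion_position(10_000, "+", 80, 50, tdna_detected=False)
print(pos)                                         # 9970
print(classify_hit(pos, atg=10_100, stop=12_000))  # gene hit
```

The sketch does not handle deletions at the T-DNA border to genome junction, which the text notes can only be resolved from the confirmation sequences.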




Publication date: 2015